Skip to content

Conversation

@guohdd
Copy link
Contributor

@guohdd guohdd commented Jan 27, 2026

What does this PR do?

This PR adds macOS support for the Software Inventory feature in System Probe. The agent now collects installed software from multiple sources on macOS:

  • applications (/Applications/ and per-user ~/Applications/)
  • Mac App Store installations
  • PKG installer receipts
  • Homebrew formulae and casks
  • MacPorts packages
  • kernel extensions
  • system extensions
    The collected data includes software name, version, install date, publisher, and installation path. Broken installations are detected and flagged with a reason.

Motivation

WINA-1938: [EUDM][SI][Agent] Collect Software Inventory on MacOS

Describe how you validated your changes

Verification via local testing and agent status. Ensure all pipelines pass.
Local test command: go test ./pkg/inventory/software/... -v

Additional Notes

@guohdd guohdd added the qa/done QA done before merge and regressions are covered by tests label Jan 27, 2026
@github-actions github-actions bot added component/system-probe long review PR is complex, plan time to review it team/windows-products and removed qa/done QA done before merge and regressions are covered by tests labels Jan 27, 2026
@agent-platform-auto-pr
Copy link
Contributor

agent-platform-auto-pr bot commented Jan 27, 2026

Static quality checks

✅ Please find below the results from static quality gates
Comparison made with ancestor 79bc07c
📊 Static Quality Gates Dashboard

Successful checks

Info

Quality gate Change Size (prev → curr → max)
agent_deb_amd64 N/A N/A → 747.027 → 754.830
agent_deb_amd64_fips N/A N/A → 707.048 → 715.320
agent_heroku_amd64 N/A N/A → 324.176 → 329.530
agent_msi N/A N/A → 658.970 → 1072.620
agent_rpm_amd64 N/A N/A → 747.011 → 754.800
agent_rpm_amd64_fips N/A N/A → 707.032 → 715.310
agent_rpm_arm64 N/A N/A → 726.188 → 737.340
agent_rpm_arm64_fips N/A N/A → 688.705 → 698.930
agent_suse_amd64 N/A N/A → 747.011 → 754.800
agent_suse_amd64_fips N/A N/A → 707.032 → 715.310
agent_suse_arm64 N/A N/A → 726.188 → 737.340
agent_suse_arm64_fips N/A N/A → 688.705 → 698.930
docker_agent_amd64 N/A N/A → 809.510 → 817.140
docker_agent_arm64 N/A N/A → 813.285 → 824.020
docker_agent_jmx_amd64 N/A N/A → 1000.421 → 1008.020
docker_agent_jmx_arm64 N/A N/A → 992.979 → 1003.620
docker_cluster_agent_amd64 N/A N/A → 180.881 → 181.200
docker_cluster_agent_arm64 N/A N/A → 196.746 → 198.490
docker_cws_instrumentation_amd64 N/A N/A → 7.135 → 7.180
docker_cws_instrumentation_arm64 N/A N/A → 6.689 → 6.920
docker_dogstatsd_amd64 N/A N/A → 38.449 → 39.380
docker_dogstatsd_arm64 N/A N/A → 36.812 → 37.940
dogstatsd_deb_amd64 N/A N/A → 29.669 → 30.610
dogstatsd_deb_arm64 N/A N/A → 27.833 → 29.110
dogstatsd_rpm_amd64 N/A N/A → 29.669 → 30.610
dogstatsd_suse_amd64 N/A N/A → 29.669 → 30.610
iot_agent_deb_amd64 N/A N/A → 42.810 → 43.290
iot_agent_deb_arm64 N/A N/A → 39.923 → 40.920
iot_agent_deb_armhf N/A N/A → 40.490 → 41.030
iot_agent_rpm_amd64 N/A N/A → 42.811 → 43.290
iot_agent_suse_amd64 N/A N/A → 42.811 → 43.290
On-wire sizes (compressed)
Quality gate Change Size (prev → curr → max)
agent_deb_amd64 N/A N/A → 182.674 → 184.810
agent_deb_amd64_fips N/A N/A → 174.173 → 177.560
agent_heroku_amd64 N/A N/A → 87.137 → 88.450
agent_msi N/A N/A → 142.484 → 143.300
agent_rpm_amd64 N/A N/A → 185.710 → 188.160
agent_rpm_amd64_fips N/A N/A → 176.453 → 178.900
agent_rpm_arm64 N/A N/A → 168.267 → 169.930
agent_rpm_arm64_fips N/A N/A → 160.590 → 163.120
agent_suse_amd64 N/A N/A → 185.710 → 188.160
agent_suse_amd64_fips N/A N/A → 176.453 → 178.900
agent_suse_arm64 N/A N/A → 168.267 → 169.930
agent_suse_arm64_fips N/A N/A → 160.590 → 163.120
docker_agent_amd64 N/A N/A → 274.748 → 277.400
docker_agent_arm64 N/A N/A → 262.305 → 266.040
docker_agent_jmx_amd64 N/A N/A → 343.402 → 346.020
docker_agent_jmx_arm64 N/A N/A → 326.935 → 330.660
docker_cluster_agent_amd64 N/A N/A → 63.884 → 64.510
docker_cluster_agent_arm64 N/A N/A → 60.162 → 61.170
docker_cws_instrumentation_amd64 N/A N/A → 2.994 → 3.330
docker_cws_instrumentation_arm64 N/A N/A → 2.726 → 3.090
docker_dogstatsd_amd64 N/A N/A → 14.877 → 15.820
docker_dogstatsd_arm64 N/A N/A → 14.219 → 14.830
dogstatsd_deb_amd64 N/A N/A → 7.838 → 8.790
dogstatsd_deb_arm64 N/A N/A → 6.727 → 7.710
dogstatsd_rpm_amd64 N/A N/A → 7.850 → 8.800
dogstatsd_suse_amd64 N/A N/A → 7.850 → 8.800
iot_agent_deb_amd64 N/A N/A → 11.226 → 12.040
iot_agent_deb_arm64 N/A N/A → 9.599 → 10.450
iot_agent_deb_armhf N/A N/A → 9.795 → 10.620
iot_agent_rpm_amd64 N/A N/A → 11.245 → 12.060
iot_agent_suse_amd64 N/A N/A → 11.245 → 12.060

@guohdd guohdd added the qa/done QA done before merge and regressions are covered by tests label Jan 27, 2026
@cit-pr-commenter-54b7da
Copy link

cit-pr-commenter-54b7da bot commented Jan 27, 2026

Regression Detector

Regression Detector Results

Metrics dashboard
Target profiles
Run ID: 7b6fb753-2f2d-43ab-be77-a901cdb6a1a8

Baseline: 0134072
Comparison: 0f681cb
Diff

Optimization Goals: ✅ Improvement(s) detected

perf experiment goal Δ mean % Δ mean % CI trials links
tcp_syslog_to_blackhole ingress throughput +17.89 [+17.78, +18.01] 1 Logs
quality_gate_logs % cpu utilization -42.99 [-44.29, -41.70] 1 Logs bounds checks dashboard

Experiments ignored for regressions

Regressions in experiments with settings containing erratic: true are ignored.

perf experiment goal Δ mean % Δ mean % CI trials links
docker_containers_cpu % cpu utilization -6.78 [-9.80, -3.76] 1 Logs

Fine details of change detection per experiment

perf experiment goal Δ mean % Δ mean % CI trials links
tcp_syslog_to_blackhole ingress throughput +17.89 [+17.78, +18.01] 1 Logs
tcp_dd_logs_filter_exclude ingress throughput +0.00 [-0.09, +0.09] 1 Logs
uds_dogstatsd_to_api ingress throughput -0.01 [-0.14, +0.12] 1 Logs
uds_dogstatsd_to_api_v3 ingress throughput -0.01 [-0.14, +0.12] 1 Logs
file_to_blackhole_100ms_latency egress throughput -0.38 [-0.42, -0.34] 1 Logs
file_to_blackhole_1000ms_latency egress throughput -0.48 [-0.89, -0.06] 1 Logs
file_to_blackhole_0ms_latency egress throughput -0.51 [-1.17, +0.15] 1 Logs
file_to_blackhole_500ms_latency egress throughput -0.54 [-0.90, -0.17] 1 Logs
ddot_metrics memory utilization -0.78 [-1.01, -0.56] 1 Logs
ddot_logs memory utilization -0.79 [-0.86, -0.73] 1 Logs
ddot_metrics_sum_cumulativetodelta_exporter memory utilization -1.03 [-1.26, -0.80] 1 Logs
ddot_metrics_sum_cumulative memory utilization -1.12 [-1.28, -0.95] 1 Logs
ddot_metrics_sum_delta memory utilization -1.40 [-1.61, -1.19] 1 Logs
quality_gate_idle_all_features memory utilization -2.16 [-2.20, -2.12] 1 Logs bounds checks dashboard
otlp_ingest_logs memory utilization -2.31 [-2.40, -2.21] 1 Logs
docker_containers_memory memory utilization -2.52 [-2.59, -2.45] 1 Logs
file_tree memory utilization -2.83 [-2.88, -2.77] 1 Logs
quality_gate_idle memory utilization -2.87 [-2.91, -2.83] 1 Logs bounds checks dashboard
uds_dogstatsd_20mb_12k_contexts_20_senders memory utilization -3.56 [-3.61, -3.50] 1 Logs
otlp_ingest_metrics memory utilization -3.57 [-3.72, -3.42] 1 Logs
quality_gate_metrics_logs memory utilization -4.87 [-5.07, -4.67] 1 Logs bounds checks dashboard
docker_containers_cpu % cpu utilization -6.78 [-9.80, -3.76] 1 Logs
quality_gate_logs % cpu utilization -42.99 [-44.29, -41.70] 1 Logs bounds checks dashboard

Bounds Checks: ✅ Passed

perf experiment bounds_check_name replicates_passed links
docker_containers_cpu simple_check_run 10/10
docker_containers_memory memory_usage 10/10
docker_containers_memory simple_check_run 10/10
file_to_blackhole_0ms_latency lost_bytes 10/10
file_to_blackhole_0ms_latency memory_usage 10/10
file_to_blackhole_1000ms_latency lost_bytes 10/10
file_to_blackhole_1000ms_latency memory_usage 10/10
file_to_blackhole_100ms_latency lost_bytes 10/10
file_to_blackhole_100ms_latency memory_usage 10/10
file_to_blackhole_500ms_latency lost_bytes 10/10
file_to_blackhole_500ms_latency memory_usage 10/10
quality_gate_idle intake_connections 10/10 bounds checks dashboard
quality_gate_idle memory_usage 10/10 bounds checks dashboard
quality_gate_idle_all_features intake_connections 10/10 bounds checks dashboard
quality_gate_idle_all_features memory_usage 10/10 bounds checks dashboard
quality_gate_logs intake_connections 10/10 bounds checks dashboard
quality_gate_logs lost_bytes 10/10 bounds checks dashboard
quality_gate_logs memory_usage 10/10 bounds checks dashboard
quality_gate_metrics_logs cpu_usage 10/10 bounds checks dashboard
quality_gate_metrics_logs intake_connections 10/10 bounds checks dashboard
quality_gate_metrics_logs lost_bytes 10/10 bounds checks dashboard
quality_gate_metrics_logs memory_usage 10/10 bounds checks dashboard

Explanation

Confidence level: 90.00%
Effect size tolerance: |Δ mean %| ≥ 5.00%

Performance changes are noted in the perf column of each table:

  • ✅ = significantly better comparison variant performance
  • ❌ = significantly worse comparison variant performance
  • ➖ = no significant change in performance

A regression test is an A/B test of target performance in a repeatable rig, where "performance" is measured as "comparison variant minus baseline variant" for an optimization goal (e.g., ingress throughput). Due to intrinsic variability in measuring that goal, we can only estimate its mean value for each experiment; we report uncertainty in that value as a 90.00% confidence interval denoted "Δ mean % CI".

For each experiment, we decide whether a change in performance is a "regression" -- a change worth investigating further -- if all of the following criteria are true:

  1. Its estimated |Δ mean %| ≥ 5.00%, indicating the change is big enough to merit a closer look.

  2. Its 90.00% confidence interval "Δ mean % CI" does not contain zero, indicating that if our statistical model is accurate, there is at least a 90.00% chance there is a difference in performance between baseline and comparison variants.

  3. Its configuration does not mark it "erratic".

CI Pass/Fail Decision

Passed. All Quality Gates passed.

  • quality_gate_metrics_logs, bounds check cpu_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_metrics_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_idle, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_idle_all_features, bounds check memory_usage: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check lost_bytes: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check intake_connections: 10/10 replicas passed. Gate passed.
  • quality_gate_logs, bounds check memory_usage: 10/10 replicas passed. Gate passed.

@guohdd guohdd marked this pull request as ready for review January 27, 2026 22:35
@guohdd guohdd requested review from a team as code owners January 27, 2026 22:35
Comment on lines 60 to 64
&condaCollector{},
&pipCollector{},
&npmCollector{},
&gemCollector{},
&cargoCollector{},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💭 thought: ‏This is a valuable expansion, but I have a few concerns regarding scope and technical constraints that we should discuss:

Parity & Criteria: This creates a feature imbalance with Windows. It opens the door to a larger discussion: what is our criteria for inclusion? (e.g. If we include these, should we also include NuGet?)

Definition: I'm debating if we should strictly define NPM/Pip packages as "installed software" in the IT sense, or if they are actually artifacts. Since the Software Catalog handles artifacts, are we at risk of product overlap here?

Scale: The backend service was designed around ~1000 packages per host. I'm worried that including these additional packages (which can be numerous) risks breaking our size calculations. Have we validated this against a heavy dev environment?

Copy link
Contributor

@julien-lebot julien-lebot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's sync on this offline. This PR introduces significant changes that could impact service reliability and product offerings, and we need to align on the strategy first.

@guohdd guohdd requested a review from a team as a code owner January 28, 2026 21:15
@guohdd guohdd requested a review from nathan-b January 28, 2026 21:15
@agent-platform-auto-pr
Copy link
Contributor

agent-platform-auto-pr bot commented Jan 28, 2026

Go Package Import Differences

Baseline: 0134072
Comparison: 0f681cb

binaryosarchchange
agentlinuxamd64
+4, -0
+github.com/DataDog/datadog-agent/comp/softwareinventory/def
+github.com/DataDog/datadog-agent/comp/softwareinventory/fx
+github.com/DataDog/datadog-agent/comp/softwareinventory/impl
+github.com/DataDog/datadog-agent/pkg/inventory/software
agentlinuxarm64
+4, -0
+github.com/DataDog/datadog-agent/comp/softwareinventory/def
+github.com/DataDog/datadog-agent/comp/softwareinventory/fx
+github.com/DataDog/datadog-agent/comp/softwareinventory/impl
+github.com/DataDog/datadog-agent/pkg/inventory/software
agentdarwinamd64
+5, -0
+github.com/DataDog/datadog-agent/comp/softwareinventory/def
+github.com/DataDog/datadog-agent/comp/softwareinventory/fx
+github.com/DataDog/datadog-agent/comp/softwareinventory/impl
+github.com/DataDog/datadog-agent/pkg/inventory/software
+github.com/mattn/go-sqlite3
agentdarwinarm64
+5, -0
+github.com/DataDog/datadog-agent/comp/softwareinventory/def
+github.com/DataDog/datadog-agent/comp/softwareinventory/fx
+github.com/DataDog/datadog-agent/comp/softwareinventory/impl
+github.com/DataDog/datadog-agent/pkg/inventory/software
+github.com/mattn/go-sqlite3
iot-agentlinuxamd64
+4, -0
+github.com/DataDog/datadog-agent/comp/softwareinventory/def
+github.com/DataDog/datadog-agent/comp/softwareinventory/fx
+github.com/DataDog/datadog-agent/comp/softwareinventory/impl
+github.com/DataDog/datadog-agent/pkg/inventory/software
iot-agentlinuxarm64
+4, -0
+github.com/DataDog/datadog-agent/comp/softwareinventory/def
+github.com/DataDog/datadog-agent/comp/softwareinventory/fx
+github.com/DataDog/datadog-agent/comp/softwareinventory/impl
+github.com/DataDog/datadog-agent/pkg/inventory/software
heroku-agentlinuxamd64
+4, -0
+github.com/DataDog/datadog-agent/comp/softwareinventory/def
+github.com/DataDog/datadog-agent/comp/softwareinventory/fx
+github.com/DataDog/datadog-agent/comp/softwareinventory/impl
+github.com/DataDog/datadog-agent/pkg/inventory/software
system-probedarwinamd64
+2, -0
+github.com/DataDog/datadog-agent/pkg/inventory/software
+github.com/mattn/go-sqlite3
system-probedarwinarm64
+2, -0
+github.com/DataDog/datadog-agent/pkg/inventory/software
+github.com/mattn/go-sqlite3

@guohdd guohdd requested a review from a team as a code owner January 29, 2026 21:23
@guohdd guohdd requested a review from rahulkaukuntla January 29, 2026 21:23
@guohdd guohdd force-pushed the hongshi/mac_sw_inventory branch from bbef7ba to f1b955f Compare January 30, 2026 14:54
Copy link
Contributor

@rahulkaukuntla rahulkaukuntla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The js file owned by agent configuration LGTM

@guohdd guohdd force-pushed the hongshi/mac_sw_inventory branch from e2cfeb0 to 38f6fa8 Compare February 3, 2026 02:04
core,k8s.io/utils/strings/slices,Apache-2.0,Copyright 2014 The Kubernetes Authors.
core,k8s.io/utils/trace,Apache-2.0,Copyright 2014 The Kubernetes Authors.
core,mellium.im/sasl,BSD-2-Clause,Copyright © 2014 The Mellium Contributors
core,modernc.org/libc,BSD-3-Clause,Bjørn Wiegell <bj.wiegell@gmail.com> | Copyright (c) 2017 The Libc Authors. All rights reserved | Dan Kortschak <dan@kortschak.io> | Dan Peterson <danp@danp.net> | Fabrice Colliot <f.colliot@gmail.com> | Jaap Aarts <jaap.aarts1@gmail.com> | Jan Mercl <0xjnml@gmail.com> | Jason DeBettencourt <jasond17@gmail.com> | Koichi Shiraishi <zchee.io@gmail.com> | Marius Orcsik <marius@federated.id> | Patricio Whittingslow <graded.sp@gmail.com> | Scot C Bontrager <scot@indievisible.org> | Steffen Butzer <steffen(dot)butzer@outlook.com> | W. Michael Petullo <mike@flyn.org> | ZHU Zijia <piggynl@outlook.com>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That is a lot of packages added. Was size of the binary increased significantly?

Copy link
Contributor Author

@guohdd guohdd Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great question. I will look into it further as multiple 3rd-party packages have been added since the inception of the PR, so I will need to get back to a proper point to see how much impact could be caused by this PR.


func getPlatformModules() fx.Option {
return fx.Options()
return fx.Options(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure about this file name, should not it have darwin in the name? Does it mean it will be executed on Linux?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This was in response to #45533 (comment)

See: fe8b555

Comment on lines +80 to +84
// First pass: deduplicate entries by ID and format dates
data := map[string]interface{}{}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why there is duplication?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have multiple inventory collectors which could report the same software from different sources, and the same software can appear in multiple locations. For example, Microsoft Word could appear in both application category and package category. The implemented duplication removal is the best effort and might not catch all cases.

Comment on lines 441 to 445
// populatePublishersParallel gets publisher info for multiple entries in parallel
// Note: Getting publisher info requires running external commands (codesign -d) for each app,
// which could be slow for a large number of apps.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth to discuss further. Calling codesign -d 300 times in parallel is not best approach either.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also do we want timeout? If one tries to exit in meantime it will be hanged. Another approach is to cache information if the same data no need to get publishers again.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will leave the codesign approach to the future based on our discussion.


// Don't descend into .app bundles - they're bundles, not folders to scan
// We'll process this .app and then skip its contents
defer func() {}() // The SkipDir is returned at the end
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need empty defer?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good catch. It was added as a placeholder and needs to be removed.

return files
}

// prefetch fetches pkgutil --files for multiple packages in parallel
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need to discuss unbounded parallel execution

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch, will refine the handling.

Comment on lines 49 to 54
// Source indicates the type or source of the software installation
// (e.g., "app", "pkg", "homebrew", "pip"). This field helps categorize
// software by its installation method or distribution channel.
// Placed first for easy identification when scanning JSON output.
Source string `json:"software_type"`

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

❓ question: ‏ Why was this moved at the top of the file? And why did you remove (e.g., "desktop", "msstore", "msu")?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good question. The purpose of moving it to the top is for easy identification when scanning JSON output, but unfortunately it removed some comments for Windows. Based on our discussion, I will rework the representation of the inventory entries.

Comment on lines 117 to 122
// PkgID is the package identifier from the macOS installer receipt database.
// This field is populated when InstallSource is "pkg" and provides a link
// to the corresponding PKG receipt in /var/db/receipts/. This enables
// cross-referencing between application entries and their installation records.
// Example: "com.microsoft.Word" for Microsoft Word installed via PKG.
PkgID string `json:"pkg_id,omitempty"`
Copy link
Contributor

@julien-lebot julien-lebot Feb 5, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// PkgID is the package identifier from the macOS installer receipt database.
// This field is populated when InstallSource is "pkg" and provides a link
// to the corresponding PKG receipt in /var/db/receipts/. This enables
// cross-referencing between application entries and their installation records.
// Example: "com.microsoft.Word" for Microsoft Word installed via PKG.
PkgID string `json:"pkg_id,omitempty"`

This should be stored in the ProductCode field instead of a separate PkgID field.

ProductCode is already defined as "a unique identifier for the software product, often used in package management systems", which is exactly what the macOS PKG ID is. Using a separate field:

  • Creates platform inconsistency (Windows uses ProductCode, macOS uses PkgID)
  • Complicates backend queries (need to check both fields)
  • Duplicates the same concept under different names

The macOS PKG ID (com.microsoft.Word) serves the same purpose as the Windows MSI ProductCode ({GUID}).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PkgID seems to track package installation instead of identifying software. I will review this more based on our discussion.

Comment on lines 136 to 143
// InstallPaths contains the top-level directories where a PKG installed files.
// This field is specific to PKG receipts and provides visibility into where
// the package scattered its files across the filesystem.
// Unlike InstallPath (single path), this captures all installation locations
// for packages that install to multiple directories (e.g., CLI tools that
// install binaries to /usr/local/bin and libraries to /usr/local/lib).
// Examples: ["/usr/local/bin", "/usr/local/ykman", "/Library/LaunchDaemons"]
InstallPaths []string `json:"install_paths,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

InstallPaths (plural) doesn't fit the software inventory model:

  1. Display problem: How would the UI render an array of paths in a table row? This would require special handling (expandable rows, tooltips, comma-separated text).

  2. Query complexity: Searching/filtering on array fields (e.g., "show all software installed in /usr/local") requires different query patterns than scalar fields.

  3. Consistency: No other field in the software inventory model is an array. This would be the first, adding complexity for a niche use case (PKG forensics).

Recommendation: Remove InstallPaths and keep only InstallPath (singular). For PKGs that scatter files across the filesystem, choose the most meaningful single path:

  • If the PKG installs an app bundle → use the .app path
  • If it's a CLI tool → use the binary path (e.g., /usr/local/bin/ykman)
  • If there's no clear primary → use the first relevant directory or N/A

A software inventory entry should represent ONE installation with ONE primary location, consistent with how Windows entries work. We can discuss expanding this later to include multiple paths but I feel like this is too much change in one PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, will rework this based on our discussion.

Comment on lines 109 to 115
// InstallSource indicates how the software was installed on macOS.
// Possible values:
// - "pkg": Installed via a .pkg installer package
// - "mas": Installed from the Mac App Store
// - "manual": Installed manually (drag-and-drop from DMG, etc.)
// This field is macOS-specific and helps understand the installation method.
InstallSource string `json:"install_source,omitempty"`
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
// InstallSource indicates how the software was installed on macOS.
// Possible values:
// - "pkg": Installed via a .pkg installer package
// - "mas": Installed from the Mac App Store
// - "manual": Installed manually (drag-and-drop from DMG, etc.)
// This field is macOS-specific and helps understand the installation method.
InstallSource string `json:"install_source,omitempty"`

This field overlaps with Source (software_type) and creates inconsistency with Windows:

  • On Windows, software_type captures both category AND install method: desktop, msstore, msu
  • On macOS, you're splitting this into software_type (category: app, pkg, homebrew...) and install_source (method: pkg, mas, manual)

This means the same concept is modeled differently across platforms. Could the macOS install method be encoded in software_type instead, similar to Windows? For example:

  • software_type: "app-pkg" (app installed via PKG)
  • software_type: "app-mas" (app from Mac App Store)
  • software_type: "app-manual" (app dragged to /Applications)

Or if there's a reason to keep them separate, please document why and how the backend/UI should handle the inconsistency and let's implement it in a separate PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great catch. Will focus on parity with Windows implementation based on our discussion.

@julien-lebot
Copy link
Contributor

julien-lebot commented Feb 5, 2026

Please note that any new fields added in this PR will be silently discarded by the backend; the backend currently only extracts the existing 9 fields and ignores any additional ones.

These fields will be collected by the agent but never reach the UI or be queryable:

  • broken_reason
  • install_source
  • pkg_id
  • install_path
  • install_paths

Did you plan to update the backend to support these new fields? If so, backend changes would need to deploy first. Should we defer adding these fields until backend support is ready?

…n dependenc; Parity with Windows implementation;caching pkg info to save CPU resource;using InstallPath internally for deduplication
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants